AITopics | reference feature

Collaborating Authors

reference feature

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MVSMamba: Multi-View Stereo with State Space Model

Neural Information Processing SystemsJun-22-2026, 21:47:58 GMT

Robust feature representations are essential for learning-based Multi-View Stereo (MVS), which relies on accurate feature matching. Recent MVS methods leverage Transformers to capture long-range dependencies based on local features extracted by conventional feature pyramid networks. However, the quadratic complexity of Transformer-based MVS methods poses challenges to balance performance and efficiency. Motivated by the global modeling capability and linear complexity of the Mamba architecture, we propose MVSMamba, the first Mamba-based MVS network. MVSMamba enables efficient global feature aggregation with minimal computational overhead. To fully exploit Mamba's potential in MVS, we propose a Dynamic Mamba module (DM-module) based on a novel referencecentered dynamic scanning strategy, which enables: (1) Efficient intra-and interview feature interaction from the reference to source views, (2) Omnidirectional multi-view feature representations, and (3) Multi-scale global feature aggregation. Extensive experimental results demonstrate MVSMamba outperforms state-of-theart MVS methods on the DTU dataset and the Tanks-and-Temples benchmark with both superior performance and efficiency.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.83)

Add feedback

Asymmetric Dual-Lens Video Deblurring

Neural Information Processing SystemsJun-11-2026, 19:32:34 GMT

Modern smartphones often feature asymmetric dual-lens systems, capturing wide-angle and ultra-wide views with complementary perspectives and details. Motion and shake can blur the wide lens, while the ultra-wide lens, despite lower resolution, retains sharper details. This natural complementarity offers valuable cues for video deblurring. However, existing methods focus mainly on single-camera inputs or symmetric stereo pairs, neglecting the cross-lens redundancy in mobile dual-camera systems. In this paper, we propose a practical video deblurring method, AsLeD-Net, which recurrently aligns and propagates temporal reference features from ultra-wide views fused with features extracted from wide-angle blurry frames. AsLeD-Net consists of two key modules: the adaptive local matching (ALM) module, which refines blurry features using $K$-nearest neighbor reference features, and the difference compensation (DC) module, which ensures spatial consistency and reduces misalignment. Additionally, AsLeD-Net uses the reference-guided motion compensation (RMC) module for temporal alignment, further improving frame-to-frame consistency in the deblurring process.

artificial intelligence, machine learning, neural information processing system 38, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Add feedback

ContextDrag: Precise Drag-Based Image Editing via Context-Preserving Token Injection and Position-Consistent Attention

He, Huiguo, Yan, Pengyu, Yi, Ziqi, Zhong, Weizhi, Liu, Zheng, Tang, Yejun, Yang, Huan, Gai, Kun, Li, Guanbin, Jin, Lianwen

arXiv.org Artificial IntelligenceDec-10-2025

Drag-based image editing aims to modify visual content followed by user-specified drag operations. Despite existing methods having made notable progress, they still fail to fully exploit the contextual information in the reference image, including fine-grained texture details, leading to edits with limited coherence and fidelity. To address this challenge, we introduce ContextDrag, a new paradigm for drag-based editing that leverages the strong contextual modeling capability of editing models, such as FLUX-Kontext. By incorporating VAE-encoded features from the reference image, ContextDrag can leverage rich contextual cues and preserve fine-grained details, without the need for finetuning or inversion. Specifically, ContextDrag introduced a novel Context-preserving Token Injection (CTI) that injects noise-free reference features into their correct destination locations via a Latent-space Reverse Mapping (LRM) algorithm. This strategy enables precise drag control while preserving consistency in both semantics and texture details. Second, ContextDrag adopts a novel Position-Consistent Attention (PCA), which positional re-encodes the reference tokens and applies overlap-aware masking to eliminate interference from irrelevant reference features. Extensive experiments on DragBench-SR and DragBench-DR demonstrate that our approach surpasses all existing SOTA methods. Code will be publicly available.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.08477

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Media > Photography (0.62)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback

ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images

Kong, Xianghao, Qi, Qiaosong, Wang, Yuanbin, Rao, Anyi, Chen, Biaolong, Zhang, Aixi, Liu, Si, Jiang, Hao

arXiv.org Artificial IntelligenceMay-13-2025

Fashion video generation aims to synthesize temporally consistent videos from reference images of a designated character. Despite significant progress, existing diffusion-based methods only support a single reference image as input, severely limiting their capability to generate view-consistent fashion videos, especially when there are different patterns on the clothes from different perspectives. Moreover, the widely adopted motion module does not sufficiently model human body movement, leading to sub-optimal spatiotemporal consistency. To address these issues, we propose ProFashion, a fashion video generation framework leveraging multiple reference images to achieve improved view consistency and temporal coherency. To effectively leverage features from multiple reference images while maintaining a reasonable computational cost, we devise a Pose-aware Prototype Aggregator, which selects and aggregates global and fine-grained reference features according to pose information to form frame-wise prototypes, which serve as guidance in the denoising process. To further enhance motion consistency, we introduce a Flow-enhanced Prototype Instantiator, which exploits the human keypoint motion flow to guide an extra spatiotemporal attention process in the denoiser. To demonstrate the effectiveness of ProFashion, we extensively evaluate our method on the MRFashion-7K dataset we collected from the Internet. ProFashion also outperforms previous methods on the UBC Fashion dataset.

artificial intelligence, machine learning, reference image, (18 more...)

arXiv.org Artificial Intelligence

2505.06537

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching

Aiello, Emanuele, Michieli, Umberto, Valsesia, Diego, Ozay, Mete, Magli, Enrico

arXiv.org Artificial IntelligenceNov-26-2024

Personalized image generation requires text-to-image generative models that capture the core features of a reference subject to allow for controlled generation across different contexts. Existing methods face challenges due to complex training requirements, high inference costs, limited flexibility, or a combination of these issues. In this paper, we introduce DreamCache, a scalable approach for efficient and high-quality personalized image generation. By caching a small number of reference image features from a subset of layers and a single timestep of the pretrained diffusion denoiser, DreamCache enables dynamic modulation of the generated image features through lightweight, trained conditioning adapters. DreamCache achieves state-of-the-art image and text alignment, utilizing an order of magnitude fewer extra parameters, and is both more computationally effective and versatile than existing models.

artificial intelligence, dreamcache, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2411.17786

Country:

North America > United States (0.14)
Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction

He, Runze, Ma, Kai, Huang, Linjiang, Huang, Shaofei, Gao, Jialin, Wei, Xiaoming, Dai, Jiao, Han, Jizhong, Liu, Si

arXiv.org Artificial IntelligenceSep-26-2024

Introducing user-specified visual concepts in image editing is highly practical as these concepts convey the user's intent more precisely than text-based descriptions. We propose FreeEdit, a novel approach for achieving such reference-based image editing, which can accurately reproduce the visual concept from the reference image based on user-friendly language instructions. Our approach leverages the multi-modal instruction encoder to encode language instructions to guide the editing process. This implicit way of locating the editing area eliminates the need for manual editing masks. To enhance the reconstruction of reference details, we introduce the Decoupled Residual ReferAttention (DRRA) module. This module is designed to integrate fine-grained reference features extracted by a detail extractor into the image editing process in a residual way without interfering with the original self-attention. Given that existing datasets are unsuitable for reference-based image editing tasks, particularly due to the difficulty in constructing image triplets that include a reference image, we curate a high-quality dataset, FreeBench, using a newly developed twice-repainting scheme. FreeBench comprises the images before and after editing, detailed editing instructions, as well as a reference image that maintains the identity of the edited object, encompassing tasks such as object addition, replacement, and deletion. By conducting phased training on FreeBench followed by quality tuning, FreeEdit achieves high-quality zero-shot editing through convenient language instructions. We conduct extensive experiments to evaluate the effectiveness of FreeEdit across multiple task types, demonstrating its superiority over existing methods. The code will be available at: https://freeedit.github.io/.

editing, instruction, reference image, (15 more...)

arXiv.org Artificial Intelligence

2409.18071

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Austria (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry: Media > Photography (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design

Neehal, Nafis, Wang, Bowen, Debopadhaya, Shayom, Dan, Soham, Murugesan, Keerthiram, Anand, Vibha, Bennett, Kristin P.

arXiv.org Artificial IntelligenceJun-25-2024

CTBench is introduced as a benchmark to assess language models (LMs) in aiding clinical study design. Given study-specific metadata, CTBench evaluates AI models' ability to determine the baseline features of a clinical trial (CT), which include demographic and relevant features collected at the trial's start from all participants. These baseline features, typically presented in CT publications (often as Table 1), are crucial for characterizing study cohorts and validating results. Baseline features, including confounders and covariates, are also necessary for accurate treatment effect estimation in studies involving observational data. CTBench consists of two datasets: "CT-Repo," containing baseline features from 1,690 clinical trials sourced from clinicaltrials.gov, and "CT-Pub," a subset of 100 trials with more comprehensive baseline features gathered from relevant publications. Two LM-based evaluation methods are developed to compare the actual baseline feature lists against LM-generated responses. "ListMatch-LM" and "ListMatch-BERT" use GPT-4o and BERT scores (at various thresholds), respectively, for evaluation. To establish baseline results, advanced prompt engineering techniques using LLaMa3-70B-Instruct and GPT-4o in zero-shot and three-shot learning settings are applied to generate potential baseline features. The performance of GPT-4o as an evaluator is validated through human-in-the-loop evaluations on the CT-Pub dataset, where clinical experts confirm matches between actual and LM-generated features. The results highlight a promising direction with significant potential for improvement, positioning CTBench as a useful tool for advancing research on AI in CT design and potentially enhancing the efficacy and robustness of CTs.

baseline feature, dataset, gpt-4o, (17 more...)

arXiv.org Artificial Intelligence

2406.17888

Country:

North America > United States > New York > Rensselaer County > Troy (0.04)
North America > United States > New York > Albany County > Albany (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models

Purushwalkam, Senthil, Gokul, Akash, Joty, Shafiq, Naik, Nikhil

arXiv.org Artificial IntelligenceJan-25-2024

Recent text-to-image generation models have demonstrated incredible success in generating images that faithfully follow input prompts. However, the requirement of using words to describe a desired concept provides limited control over the appearance of the generated concepts. In this work, we address this shortcoming by proposing an approach to enable personalization capabilities in existing text-to-image diffusion models. We propose a novel architecture (BootPIG) that allows a user to provide reference images of an object in order to guide the appearance of a concept in the generated images. The proposed BootPIG architecture makes minimal modifications to a pretrained text-to-image diffusion model and utilizes a separate UNet model to steer the generations toward the desired appearance. We introduce a training procedure that allows us to bootstrap personalization capabilities in the BootPIG architecture using data generated from pretrained text-to-image models, LLM chat agents, and image segmentation models. In contrast to existing methods that require several days of pretraining, the BootPIG architecture can be trained in approximately 1 hour. Experiments on the DreamBooth dataset demonstrate that BootPIG outperforms existing zero-shot methods while being comparable with test-time finetuning approaches. Through a user study, we validate the preference for BootPIG generations over existing methods both in maintaining fidelity to the reference object's appearance and aligning with textual prompts.

architecture, bootpig, reference image, (14 more...)

arXiv.org Artificial Intelligence

2401.13974

Country:

Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Heilongjiang Province > Daqing (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Skin feature point tracking using deep feature encodings

Chang, Jose Ramon, Nordling, Torbjörn E. M.

arXiv.org Artificial IntelligenceDec-28-2021

Facial feature tracking is a key component of imaging ballistocardiography (BCG) where accurate quantification of the displacement of facial keypoints is needed for good heart rate estimation. Skin feature tracking enables video-based quantification of motor degradation in Parkinson's disease. Traditional computer vision algorithms include Scale Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Lucas-Kanade method (LK). These have long represented the state-of-the-art in efficiency and accuracy but fail when common deformations, like affine local transformations or illumination changes, are present. Over the past five years, deep convolutional neural networks have outperformed traditional methods for most computer vision tasks. We propose a pipeline for feature tracking, that applies a convolutional stacked autoencoder to identify the most similar crop in an image to a reference crop containing the feature of interest. The autoencoder learns to represent image crops into deep feature encodings specific to the object category it is trained on. We train the autoencoder on facial images and validate its ability to track skin features in general using manually labeled face and hand videos. The tracking errors of distinctive skin features (moles) are so small that we cannot exclude that they stem from the manual labelling based on a $\chi^2$-test. With a mean error of 0.6-4.2 pixels, our method outperformed the other methods in all but one scenario. More importantly, our method was the only one to not diverge. We conclude that our method creates better feature descriptors for feature tracking, feature matching, and image registration than the traditional algorithms.

bike condition, neighbor, sift, (16 more...)

arXiv.org Artificial Intelligence

2112.14159

Country:

Europe > Sweden > Västerbotten County > Umeå (0.04)
Oceania > New Zealand (0.04)
North America > United States > Tennessee > Knox County > Knoxville (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Reference Distance Estimator

Li, Yanpeng

arXiv.org Machine LearningAug-17-2013

Abstract: A theoretical study is presented for a simple linear classifier called reference distance estimator (RDE), which assigns the weight of each feature j as P(r j)-P(r), where r is a reference feature relevant to the target class y. The analysis shows that if r performs better than random guess in predicting y and is conditionally independent with each feature j, the RDE will have the same classification performance as that from P(y j)-P(y), a classifier trained with the gold standard y. Since the estimation of P(r j)-P(r) does not require labeled data, under the assumption above, RDE trained with a large number of unlabeled examples would be close to that trained with infinite labeled examples. For the case the assumption does not hold, we theoretically analyze the factors that influence the closeness of the RDE to the perfect one under the assumption, and present an algorithm to select reference features and combine multiple RDEs from different reference features using both labeled and unlabeled data. The experimental results on 10 text classification tasks show that the semi-supervised learning method improves supervised methods using 5,000 labeled examples and 13 million unlabeled ones, and in many tasks, its performance is even close to a classifier trained with 13 million labeled examples. In addition, the bounds in the theorems provide good estimation of the classification performance and can be useful for new algorithm design.

artificial intelligence, machine learning, reference feature, (19 more...)

arXiv.org Machine Learning

1308.3818

Country: Asia > China (0.14)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Add feedback